METHOD AND APPARATUS FOR VOICE TRANSCODING
BETWEEN VARIABLE RATE CODERS 

BACKGROUND OF THE INVENTION 
[0001] The present invention relates generally to processing of telecommunication signals. 
More particularly, the present invention relates to a method and apparatus for transcoding a 
bitstream encoded by a first voice speech coding format into a bitstream encoded by a second 
variable-rate voice coding format. Merely by way of example, the invention has been applied 
to variable-rate voice transcoding, but it would be recognized that the invention may also be
applicable to other applications. 

[0002] Telecommunication techniques have progressed through the years. One of the 
major desires of speech coding development is high quality output speech at a low average 
data rate. One approach is to employ a variable bit-rate scheme, whereby the transmission
rate is determined not only by the network traffic but also by the characteristics of the input
speech signal. For example, when the signal is highly voiced, a high bit rate may be chosen;
if the signal is weak, a low bit rate is chosen; and if the signal has mostly silence or
background noise, a lower bit rate is chosen. This often provides efficient allocation of the
available bandwidth, without sacrificing output voice quality. Such variable-rate coders
include the TIA IS-127 Enhanced Variable Rate Codec (EVRC), and 3rd Generation
Partnership Project 2 (3GPP2) Selectable Mode Vocoder (SMV). These coders use Rate Set
1 of the Code Division Multiple Access (CDMA) communication standards IS-95 and
cdma2000, which include rates of 8.55 kbit/s (Rate 1 or full rate), 4.0 kbit/s (half-rate), 2.0
kbit/s (quarter-rate) and 0.8 kbit/s (eighth rate). SMV selects the bit rate based on the input
speech characteristics and operates in one of six network controlled modes, which limit the 
bit rate during high traffic. Depending on the mode of operation, different thresholds may be 
set to determine the rate usage percentages. 
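
As an illustrative calculation only, the average data rate of a Rate Set 1 coder follows from the fraction of frames coded at each rate. The sketch below uses the four rates listed above; the usage fractions and the function name are hypothetical values chosen for the example and are not taken from any codec standard.

```python
# Rate Set 1 bit rates in kbit/s, as listed in the text above.
RATE_SET_1_KBPS = {"full": 8.55, "half": 4.0, "quarter": 2.0, "eighth": 0.8}


def average_rate_kbps(usage):
    """Return the average bit rate for a mapping of rate name -> fraction of frames."""
    total = sum(usage.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("usage fractions must sum to 1")
    return sum(RATE_SET_1_KBPS[name] * frac for name, frac in usage.items())


# Hypothetical usage fractions (assumed for illustration only).
example_usage = {"full": 0.4, "half": 0.1, "quarter": 0.1, "eighth": 0.4}
avg = average_rate_kbps(example_usage)
print(f"average rate: {avg:.2f} kbit/s")
```

Shifting usage from the full rate toward the lower rates, as a network-controlled mode may do during high traffic, lowers this average.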

[0003] To accurately decide the desired transmission rate, and obtain high quality output 
speech at that rate, input speech frames are categorized into various classes. For example, in 
SMV, these classes include silence, unvoiced, onset, plosive, non-stationary voiced and 
stationary voiced speech. It is known that certain coding techniques are better suited for 
certain classes of sounds. Also, some types of sounds, for example, voice onsets or
unvoiced-to-voiced transition regions, have higher perceptual significance and thus generally
require higher coding accuracy than other classes of sounds, such as unvoiced speech. Thus,
the speech frame classification may be used not only to decide the most efficient
transmission rate, but also to select the best-suited coding algorithm.

[0004] Accurate classification of input speech frames is desired to fully exploit the signal
redundancies and perceptual importance. Typical frame classification techniques include
voice activity detection, measuring the amount of noise in the signal, measuring the level of
voicing, detecting speech onsets, and measuring the energy in a number of frequency bands.
These measures generally require the calculation of numerous parameters, such as maximum
correlation values, line spectral frequencies, and frequency transformations.

[0005] While coders such as SMV achieve much better quality at lower average data rate
than existing speech codecs at similar bit rates, the frame classification and rate determination
algorithms are complex. In the case of a tandem connection of two speech vocoders,
however, many of the measurements performed for frame classification have already been
calculated in the source codec. This can be capitalized on in a transcoding framework. In
transcoding from the bitstream format of one CELP codec to the bitstream format of another
CELP codec, rather than fully decoding to PCM and re-encoding the speech signal, smart
interpolation methods may be applied directly in the CELP parameter space. Hence the
parameters, such as pitch lag, pitch gain, fixed codebook gain, line spectral frequencies and
the source codec bit rate are available to the destination codec. This allows frame
classification and rate determination of the destination voice codec to be performed in a fast
manner.

[0006] The simplest method of transcoding is a brute-force approach called tandem
transcoding, shown in Figure 1. This method performs a full decode of the incoming
compressed bits to produce synthesized speech. The synthesized speech is then encoded for
the target standard. This method is undesirable because of the huge amount of computation
performed in re-encoding the signal, as well as quality degradations introduced by pre- and
post-filtering of the speech waveform, and the potential delays introduced by the look-ahead
requirements of the encoder.

[0007] Methods for "smart" transcoding similar to that illustrated in Figure 2 have
appeared in the literature. These methods essentially reconstruct the speech signal and then
perform significant work to extract the various CELP parameters such as line spectral
frequencies and pitch. That is, these methods still operate in the speech signal space. In
particular, the excitation signal which has already been optimally matched to the original
speech by the far-end source encoder (encoder that produced the compressed speech
according to a compression format) is often only used for the generation of the synthesized
speech. The synthesized speech is then used to compute a new optimal excitation. Due to
the requirement of incorporating impulse response filtering operations in closed-loop
searches of the excitation parameters, this becomes a very computationally intensive
operation.

[0008] Further, these transcoding methods do not cover the transcoding between variable-rate
voice coders which determine the bit rate based on the characteristics of the input speech
and, in some cases, external commands. During the transcoding process, the frame
classification and rate decision of the destination voice codec are still
computed through the speech signal domain. The transcoder thus requires computational
resources equivalent to those of the destination codec to classify frame types and to
determine the bit rates. The smart transcoding of previous methods may lose part of its
computational advantage, as the classification algorithms require parameters from
intermediate stages of functions that have been omitted. For example, recalculation of the
line spectral frequencies is often not performed in transcoding; however, the LPC prediction
gain, LPC prediction error, autocorrelation function and reflection coefficients are often
required in the classification and rate determination process.

[0009] From the above, it is seen that improved telecommunication techniques are desired. 

BRIEF SUMMARY OF THE INVENTION
[0010] According to the present invention, techniques for processing of telecommunication
signals are provided. More particularly, the present invention relates to a method and
apparatus for transcoding a bitstream encoded by a first voice speech coding format into a
bitstream encoded by a second variable-rate voice coding format. Merely by way of
example, the invention has been applied to variable-rate voice transcoding, but it would be
recognized that the invention may also be applicable to other applications.

[0011] According to an aspect of the present invention, there is provided a voice
transcoding apparatus comprising:

• a first voice compression code parameter unpack module that extracts the input
encoded bitstream according to the first voice codec standard into its speech
parameters. In the case of CELP-based codecs, these parameters may be line
spectral frequencies, pitch lag, adaptive codebook gains, fixed codebook gains,
codevectors as well as other parameters;

• a frame classification and rate determination module that takes the parameters
from the input encoded bitstream and external control commands to generate the
destination codec frame type and rate decision;

• at least one parameter interpolator and mapping module that converts the input
source parameters into destination encoded parameters, taking into account the
subframe and/or frame size difference between the source and destination codec;

• a destination parameter packer that converts the encoded parameters into output 
encoded packets; 

• a first stage switching module that connects the source parameter unpack 
module to a parameter interpolator and mapping module; 

• a second stage switching module that connects the destination parameter pack 
module to a parameter interpolator and mapping module; 

• a control engine that controls the selection of the parameter timing engine to adapt
to the available resources and signal processing requirements; and

• a status reporting module that provides the status of parameter-based
transcoding.

[0012] Numerous benefits are achieved using the present invention over conventional
techniques. These benefits include the following:

To perform smart voice transcoding between variable-rate voice codecs; 

To classify the destination codec frame type directly from the parameters of input 
source codec frames; 

To determine the rate of the destination codec directly from the parameters of input 
source codec frames; 

To improve voice quality through mapping parameters in the parameter space;

To reduce the computational complexity of the transcoding process;

To reduce the delay through the transcoding process; 

To reduce the amount of memory required by the transcoding; and 

To provide a generic transcoding architecture that may be adapted to current and
future variable-rate codecs.

[0013] Depending upon the embodiment, one or more of these benefits may be achieved. 
These and other benefits are described throughout the present specification and more 
particularly below.

[0014] Other features and advantages of the present invention will be apparent from the
following description taken in conjunction with the accompanying drawing, in which like
reference characters designate the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The objectives, features, and advantages of the present invention, which are
believed to be novel, are set forth in detail in the appended claims. The present invention,
both as to its organization and manner of operation, together with further objectives and
advantages, may best be understood by reference to the following description, in connection
with the accompanying drawings.

[0016] Figure 1 is a simplified block diagram illustrating the general tandem coding
connection to convert a bitstream from one codec format to another codec format.

[0017] Figure 2 is a simplified block diagram illustrating a general transcoder connection to
convert a bitstream from one codec format to another codec format without full decode and
re-encode.

[0018] Figure 3 is a simplified block diagram illustrating the encoding processes performed
in a variable-rate voice encoder.

[0019] Figure 4 is a simplified block diagram of the variable-rate voice codec transcoding
according to an embodiment of the present invention based on a smart frame classification
and rate determination method.

[0020] Figure 5 is a simplified flowchart of the steps performed in the variable-rate voice
codec transcoding according to an embodiment of the present invention based on a smart
frame classification and rate determination method.

[0021] Figure 6 is a simplified diagram of a smart frame classification and rate
determination classifier according to an embodiment of the present invention.

[0022] Figure 7 is a simplified block diagram illustrating the frame classification and rate
determination in a variable-rate encoder according to an embodiment of the present
invention.

[0023] Figure 8 illustrates the various stages of frame classification in a variable-rate voice
encoder according to an embodiment of the present invention.

[0024] Figure 9 is a simplified block diagram illustrating a first set of CELP parameters for
an active frame being transformed to a second set of CELP parameters according to an
embodiment of the present invention.

[0025] Figure 10 is a simplified block diagram illustrating a first set of CELP parameters
for a silence or noise-like frame being transformed to a second set of CELP parameters
according to an embodiment of the present invention.

[0026] Figure 11 is a simplified block diagram illustrating the decoding process performed
in a RCELP-based voice decoder according to an embodiment of the present invention.

[0027] Figure 12 illustrates the various stages of voice signal pre-processing in a variable
rate voice encoder according to an embodiment of the present invention.

[0028] Figure 13 is a simplified block diagram illustrating the subframe excitation
encoding process performed in a RCELP-based voice encoder according to an embodiment of
the present invention.

[0029] Figure 14 is a simplified block diagram illustrating the subframe excitation
encoding process performed in another RCELP-based voice encoder according to an
embodiment of the present invention.

[0030] Figure 15 is a simplified block diagram illustrating the subframe
excitation transcoding process according to an embodiment of the present invention.

[0031] Figure 16 is a simplified flowchart showing the steps of the
subframe excitation transcoding process according to an embodiment of the present
invention.

[0032] Figure 17 is a simplified block diagram illustrating the voice transcoding procedure
from EVRC to SMV according to an embodiment of the present invention.

[0033] Figure 18 is a simplified block diagram illustrating the voice transcoding procedure
from SMV to EVRC according to an embodiment of the present invention.

[0034] Figure 19 is a simplified diagram illustrating the subframe size and frame size of
different frame types and different rates in the SMV voice coder according to an embodiment
of the present invention.

[0035] Figure 20 is a simplified diagram illustrating the subframe size and frame size of
different rates in the EVRC voice coder according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION
[0036] According to the present invention, techniques for processing of telecommunication
signals are provided. More particularly, the present invention relates to a method and
apparatus for transcoding a bitstream encoded by a first voice speech coding format into a
bitstream encoded by a second variable-rate voice coding format. Merely by way of
example, the invention has been applied to variable-rate voice transcoding, but it would be
recognized that the invention may also be applicable to other applications.

[0037] A method and apparatus of the invention are discussed in detail below. In the 
following description, for purposes of explanation, numerous specific details are set forth in 
order to provide a thorough understanding of the present invention. The cases of SMV and
EVRC are used for the purpose of illustration and for examples. The methods described here
are generic and apply to the transcoding between any pair of linear prediction-based voice
codecs. A person skilled in the relevant art will recognize that other steps, configurations and
arrangements can be used without departing from the spirit and scope of the present 
invention. 

[0038] A block diagram of a tandem connection between two voice codecs is shown in
Figure 1. Alternatively a transcoder may be used, as shown in Figure 2, which converts the
bitstream from a source codec to the bitstream of a destination codec without fully decoding
the signal to PCM and then re-encoding the signal. The present invention is a transcoder
between voice codecs, whereby the destination codec is a variable bit-rate voice codec that 
determines the bit-rate based on the input speech characteristics. A block diagram of the 
encoder of a variable bit-rate voice coder is shown in Figure 3. The input speech signal 
passes through several processing stages including pre-processing, estimation of model 
parameters and computation of classification features. Then, a rate, and in some cases, a 
frame type, is determined based on the features detected. Depending on the rate decision, a 
different strategy may be used in the encoding process. Once coding is complete, the 
parameters are packed in the bitstream. 

[0039] A diagram of the apparatus for transcoding between two variable bit-rate voice
codecs of the present invention is shown in Figure 4. The apparatus comprises a source
codec unpacking module, an intermediate parameters interpolation module, a smart frame
classification and rate determination module, several mapping strategy modules, a switching
module to select the desired mapping strategy, a destination packet formation module, and a
second switching module that links the mapping strategy to the destination packet formation
module. The method for transcoding between two variable bit-rate voice codecs is shown in
Figure 5.

[0040] Firstly, the bitstream representing frames of data encoded according to the source 
voice codec is unpacked and unquantized by a bitstream unpacking module. The actual 
parameters extracted from the bitstream depend on the source codec and its bit rate, and may 
include line spectral frequencies, pitch delays, delta pitch delays, adaptive codebook gains, 
fixed codebook shapes, fixed codebook gains and frame energy. Particular voice codecs may 
also transmit information regarding spectral transition, interpolation factors, the switch 
predictor used as well as other minor parameters. The unquantized parameters are passed to
the intermediate parameters interpolation module. 

[0041] The intermediate parameters interpolation module interpolates between different 
frame sizes, subframe sizes and sampling rates. This is required if there are differences in the 
frame size or subframe size of the source and destination codecs, in which case the 
transmission frequency of parameters may not be matched. Also, a difference in the 
sampling rate between the source codec and destination codec requires modification of
parameters. The output interpolated parameters are passed to the smart frame classification
and rate determination module and one of the mapping modules.
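
A minimal sketch of subframe interpolation is given here for illustration. It assumes simple linear interpolation between subframe centres within a single frame and holds the edge values; the function name is hypothetical, and an actual embodiment may instead interpolate across frame boundaries or treat each parameter type differently.

```python
def resample_subframe_values(values, n_dst):
    """Linearly interpolate per-subframe values onto n_dst equal subframes.

    Each value is taken to sit at the centre of its subframe. Positions are
    expressed as fractions of the frame, so the frame length cancels out.
    """
    n_src = len(values)
    if n_src == 0 or n_dst <= 0:
        raise ValueError("need at least one source value and one destination subframe")
    if n_src == 1:
        return [values[0]] * n_dst
    src_pos = [(i + 0.5) / n_src for i in range(n_src)]
    out = []
    for j in range(n_dst):
        t = (j + 0.5) / n_dst
        if t <= src_pos[0]:
            out.append(values[0])      # before the first centre: hold (assumption)
        elif t >= src_pos[-1]:
            out.append(values[-1])     # after the last centre: hold (assumption)
        else:
            # Index of the source centre at or just before t.
            k = max(i for i in range(n_src - 1) if src_pos[i] <= t)
            w = (t - src_pos[k]) / (src_pos[k + 1] - src_pos[k])
            out.append((1 - w) * values[k] + w * values[k + 1])
    return out


# Example: three source subframe values mapped onto four destination subframes.
mapped = resample_subframe_values([40.0, 46.0, 52.0], 4)
```

When the source and destination subframe counts are equal, the function returns the input values unchanged, which corresponds to the case where no interpolation is required.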

[0042] The frame classification and rate determination module receives the unquantized
interpolated parameters of the source codec and the external control commands of the
destination codec, as shown in Figure 6. The frame classification and rate determination
module comprises a classifier input parameter selector, for selecting which inputs will be
used in the classification task, M sub-classifiers, buffers to store past input parameters and
past output values, and a final decision module. The classifier takes as input the selected
classification input parameters, external commands, and past input and output values, and
generates as output the frame class and rate decision for the destination codec. Once
classification has been performed, the states of the data buffers storing past parameter values
are updated. The output rate and frame type decision controls the first switching module that
selects the parameter mapping module, and the second switching module that links the
parameter mapping module to the bitstream packing module. Frame classification is
performed according to pre-defined coefficients or rules determined during a prior training or
classifier construction process. Several types of classification techniques may be used,
including, but not limited to, decision trees, rule-based models, and artificial neural
networks. The functions for computing classification features and the many steps of the
classification procedure for a particular codec are shown in Figure 7 and Figure 8
respectively. In an embodiment of the present invention, the frame classification and rate
determination module replaces the standard classifier of the destination codec, as well as the
processing functions of the destination codec required to generate the classification
parameters.
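
The structure described above (sub-classifiers, a buffer of past outputs, external commands and a final decision) may be sketched as a small rule-based classifier. This is an illustrative toy only: the class name, the parameter keys, the thresholds, the class labels and the hangover rule are all hypothetical and are not the classification rules of SMV, EVRC or any embodiment.

```python
from collections import deque


class ParameterDomainClassifier:
    """Toy frame classifier working on unpacked source-codec parameters."""

    def __init__(self, history=4):
        # Buffer of past output rates (the "past output values" in the text).
        self.past_rates = deque(maxlen=history)

    @staticmethod
    def _is_active(params):
        # Sub-classifier 1: hypothetical frame-energy threshold.
        return params["frame_energy"] > 0.01

    @staticmethod
    def _is_voiced(params):
        # Sub-classifier 2: hypothetical adaptive codebook gain threshold.
        return params["acb_gain"] > 0.5

    def classify(self, params, half_rate_max=False):
        """Return (frame_class, rate) and update the history buffer."""
        # Final decision: combine the sub-classifier outputs.
        if not self._is_active(params):
            frame_class, rate = "inactive", "eighth"
            # Hypothetical hangover: step down gradually after a full-rate frame.
            if self.past_rates and self.past_rates[-1] == "full":
                rate = "quarter"
        elif self._is_voiced(params):
            frame_class, rate = "voiced", "full"
        else:
            frame_class, rate = "unvoiced", "half"
        # External command: cap the rate when a half-rate maximum is requested.
        if half_rate_max and rate == "full":
            rate = "half"
        self.past_rates.append(rate)
        return frame_class, rate


classifier = ParameterDomainClassifier()
first = classifier.classify({"frame_energy": 0.5, "acb_gain": 0.8})
second = classifier.classify({"frame_energy": 0.001, "acb_gain": 0.1})
third = classifier.classify({"frame_energy": 0.001, "acb_gain": 0.1})
```

The point of the sketch is that every input is already available from the unpacked bitstream, so no speech-domain analysis is needed to reach a decision.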

[0043] The intermediate parameters interpolation module and the frame classification and
rate determination module are linked to one of many parameter mapping modules by a
switching module. The destination codec frame type and bit rate determined by the frame
classification and rate determination module control which mapping module is to be chosen.
Mapping modules may exist for each combination of bit-rate and frame class of the source
codec to each bit rate and frame class of the destination codec.
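
One way to picture the switching module is as a lookup table from a rate transition to a mapping routine. The sketch below is an assumption made for illustration: the function names, the table keys and the listed transitions are hypothetical, and a fuller table could also be keyed on the source and destination frame classes.

```python
def map_active_frame(params):
    """Placeholder for an active-speech mapping module (hypothetical)."""
    return {"module": "active", "params": params}


def map_noise_frame(params):
    """Placeholder for a silence or noise-like mapping module (hypothetical)."""
    return {"module": "noise", "params": params}


# Keys are (source rate, destination rate); the entries are illustrative only.
MAPPING_MODULES = {
    ("full", "full"): map_active_frame,
    ("full", "half"): map_active_frame,
    ("half", "full"): map_active_frame,
    ("half", "half"): map_active_frame,
    ("eighth", "quarter"): map_noise_frame,
    ("eighth", "eighth"): map_noise_frame,
}


def select_mapping_module(src_rate, dst_rate):
    """Return the mapping routine for a rate transition, or raise ValueError."""
    try:
        return MAPPING_MODULES[(src_rate, dst_rate)]
    except KeyError:
        raise ValueError(f"no mapping module for {src_rate} -> {dst_rate}") from None


result = select_mapping_module("full", "half")({"pitch_lag": 40})
```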

[0044] Each mapping module comprises a speech spectral parameter mapping unit, an
excitation mapping unit, and a mapping strategy decision unit. The speech spectral parameter
mapping unit maps the spectral parameters, usually line spectral pairs (LSPs) or line spectral
frequencies (LSFs), of the source codec, directly to the spectral parameters of the destination
codec. A calibration factor is calculated and used to calibrate the excitation to account for the
differences in the quantised spectral parameters of the source and destination codec. The
excitation mapping unit takes CELP excitation parameters including pitch lag, adaptive
codebook gain, fixed codebook gain and fixed codebook codevectors from the interpolator
and maps these to encoded CELP excitation parameters according to the destination codec.
Figure 9 shows a mapping module which may be selected for mapping parameters of an
active speech frame, e.g., mapping from Rate 1/2 or Rate 1 of EVRC to Rate 1/2 or Rate 1 of
SMV. In this case, the input parameters to the excitation coding mapping unit are the 
adaptive codebook lag, adaptive codebook gain, fixed codebook codevector and fixed 
codebook gain of the source codec. The output parameters of the excitation coding mapping
unit are the adaptive codebook lag, adaptive codebook gain, fixed codebook codevector and
fixed codebook gain in the format of the destination codec. Figure 10 shows a mapping
module which may be selected for mapping parameters of a silence or noise-like speech
frame, e.g., mapping from Rate 1/8 of EVRC to Rate 1/4 or Rate 1/8 of SMV. In this case, the
input parameters to the excitation coding mapping unit are typically the frame energy or 
subframe energies, and excitation shape. Not all excitation parameters shown in the figures 
may be present for a given codec or bit rate. 

[0045] Linked to the excitation coding mapping unit is a mapping strategy decision unit, 
which controls the type of excitation mapping to be used. Several mapping approaches may 
be used, including those using direct mapping from source codec to destination codec without 
any further analysis or iterations, analysis in the excitation domain, analysis in the filtered 
excitation domain or a combination of these strategies, such as searching the adaptive 
codebook in the excitation space and fixed codebook in the filtered excitation space. The 
mapping strategy decision module determines which mapping strategy is to be applied. The 
decision may be based on available computational resources or minimum quality 
requirements and can change in a dynamic fashion. 

[0046] Except for the direct mapping strategy, in which parameters are directly mapped 
from source codec format to destination codec format without any analysis, the excitation 
signal is reconstructed. Reconstruction of the excitation during active speech requires the 
interpolated excitation parameters of pitch delays, adaptive codebook gains, fixed codebook 
shapes, and fixed codebook gains. During silence or noise, the parameters required are the 
signal energy, signal shape if available, and a random noise generator. Figure 11 shows a
block diagram of the decoding process performed in a RCELP-based voice decoder. In this
figure, the linear prediction (LP) excitation is formed by combining the gain-scaled
contributions of the adaptive and fixed codebooks, and then filtered by the speech synthesis
filter and post-filter. In the transcoder architecture of the present invention, to reduce
complexity and quality degradations, the final source codec decoder operations of filtering
the LP excitation signal by the synthesis filter to convert to the speech domain and then
post-filtering to mask quantization noise are not used. Similarly, the pre-processing operations in
the encoder of the destination codec are not used. An example of a speech pre-processor is
shown in Figure 12. High-pass filtering is a common pre-processing step in existing CELP-
based voice codecs, with the advanced steps of silence enhancement, noise suppression and
adaptive tilt filtering being applied in more recent voice codecs. In the case where the source
codec does not use noise suppression and the destination codec does use noise suppression,
the transcoder architecture should provide noise suppression functionality.
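
The excitation reconstruction for an active subframe, that is, the sum of the gain-scaled adaptive and fixed codebook contributions with no synthesis filtering or post-filtering, can be sketched as follows. The sketch is a simplification under stated assumptions: it uses an integer pitch lag only, ignores fractional lags and the RCELP delay contour, and all function and argument names are hypothetical.

```python
def adaptive_codevector(past_excitation, lag, length):
    """Build the adaptive codebook vector by repeating the last `lag` samples."""
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must be between 1 and the history length")
    segment = past_excitation[-lag:]
    # For lag < length the segment repeats periodically.
    return [segment[n % lag] for n in range(length)]


def reconstruct_excitation(past_excitation, lag, acb_gain, fixed_vector, fcb_gain):
    """Return acb_gain * adaptive vector + fcb_gain * fixed vector, sample by sample."""
    v = adaptive_codevector(past_excitation, lag, len(fixed_vector))
    return [acb_gain * a + fcb_gain * c for a, c in zip(v, fixed_vector)]


# Small worked example with made-up numbers.
history = [1.0, 2.0, 3.0, 4.0]
excitation = reconstruct_excitation(history, lag=2, acb_gain=0.5,
                                    fixed_vector=[0.0, 1.0, 0.0, 0.0], fcb_gain=2.0)
```

In this example the adaptive vector is [3, 4, 3, 4], so the reconstructed excitation is [1.5, 4.0, 1.5, 2.0]. The result stays in the excitation domain, which is where the mapping strategies other than direct mapping operate.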

[0047] Current variable-rate voice codecs applicable to the present invention include 
EVRC and SMV which are based on the Relaxed CELP (RCELP) principle. Typical 
excitation quantization in RCELP codecs is performed by the technique shown in Figure 13 
and Figure 14. In this case, the target signal is modified weighted speech. The modification 
is performed to create a signal with a smooth interpolated pitch delay contour by time-
warping or time-shifting pitch pulses. This allows for coarse pitch quantization. The
adaptive codebook is mapped to the delay contour and then searched by gain-adjusting and
filtering each candidate vector by the weighted synthesis filter and comparing the result to the
target signal. Once the best adaptive codebook vector is found, its contribution is subtracted
from the target, and the fixed codebook is searched in a similar manner. In the case where
both source and destination codecs are based on the RCELP principle, the computationally 
expensive operation of detecting and shifting each pitch pulse in the encoder processing of 
the destination codec is not required. This is due to the fact that the reconstructed source 
excitation already follows the interpolated pitch track of the source codec. Hence, the target 
signal in the transcoder is not modified weighted speech, but simply the weighted speech, 
speech, weighted excitation, excitation, or calibrated excitation signal. 

[0048] Figure 15 shows a block diagram of an example of one mapping strategy of the 
transcoder between variable-rate voice codecs of the present invention. The procedure is 
outlined in Figure 16. In this case, the mapping strategy chosen is a combination between 
analysis in the excitation domain and analysis in the filtered excitation domain. The target
signal for the adaptive codebook search is the calibrated excitation signal. The search of the
adaptive codebook is performed in the excitation domain. This reduces complexity as each
candidate codevector does not need to be filtered with the weighted synthesis filter before it
can be compared to a speech domain target signal. The initial estimate of the pitch lag is the
pitch lag obtained from the interpolation module that has been interpolated to match the
subframe size of the destination codec. The pitch is searched within a small interval of the
initial pitch estimate, at the accuracy (integer or fractional pitch) required by the destination
codec. The adaptive codebook gain is then determined for the best codevector and the
adaptive codevector contribution is removed from the calibrated excitation. The result is
filtered using a special weighting filter to produce the target signal for the fixed codebook
search. The fixed codebook is then searched, either by a fast technique or by gain-adjusting
and filtering candidate codevectors by the special weighting filter and comparing the result
with the target. Fast search methods may be applied for both the adaptive and fixed
codebook searches.
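
The excitation-domain adaptive codebook search described above may be sketched as a search over a few integer lags around the initial estimate. The selection criterion used here (maximising the squared correlation divided by the codevector energy, with the gain taken as correlation over energy) is a common least-squares choice and is assumed for illustration; the function names, the search width and the test signal are hypothetical, and fractional lags are not handled.

```python
def adaptive_codevector(history, lag, length):
    """Repeat the last `lag` samples of the excitation history to `length` samples."""
    segment = history[-lag:]
    return [segment[n % lag] for n in range(length)]


def search_pitch_lag(target, history, initial_lag, delta=2):
    """Search integer lags within +/- delta of initial_lag in the excitation domain.

    Returns (best_lag, gain). No candidate is passed through a synthesis filter.
    """
    best = None
    lo = max(1, initial_lag - delta)
    hi = min(len(history), initial_lag + delta)
    for lag in range(lo, hi + 1):
        v = adaptive_codevector(history, lag, len(target))
        corr = sum(t * x for t, x in zip(target, v))
        energy = sum(x * x for x in v)
        if energy == 0.0 or corr <= 0.0:
            continue  # candidate cannot reduce the error with a positive gain
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, lag, corr / energy)
    if best is None:
        return initial_lag, 0.0  # fall back to the interpolated lag with zero gain
    return best[1], best[2]


# Made-up periodic excitation with period 5, and a target scaled by 0.8.
history = [1.0, 0.0, 0.0, 0.0, 0.0] * 4
target = [0.8, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]
lag, gain = search_pitch_lag(target, history, initial_lag=6, delta=2)
```

Starting from an initial estimate of 6, the search settles on the true period of 5 with a gain of 0.8. Removing `gain` times the chosen codevector from the target would then leave the residual used for the fixed codebook search.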

[0049] Another mapping strategy is to perform both the adaptive codebook and fixed 
codebook searches in the excitation domain. A further mapping strategy is to perform both
the adaptive codebook and fixed codebook searches in the filtered excitation domain.
Alternatively, parameters may be directly mapped from source to destination codec format
without any searching. It is noted that any combinations of the above strategies may also be 
used. The best strategy in terms of both high quality and low complexity will depend on the 
source and destination codecs and bit rates. 

[0050] A second-stage switching module links the interpolation and mapping module to the 
destination bitstream packing module. The destination bitstream packing module packs the 
destination CELP parameters in accordance with the destination codec standard. The 
parameters to be packed depend on the destination codec, the bit rate and frame type. 

EVRC ↔ SMV TRANSCODING EXAMPLE

[0051] As an example, it is assumed that the source codec is the Enhanced Variable Rate 
Codec (EVRC) and the destination codec is the Selectable Mode Vocoder (SMV). 

[0052] EVRC and SMV are both variable-rate codecs that determine the bit rate based on
the characteristics of the input speech. These coders use Rate Set 1 of the Code Division
Multiple Access communication standards IS-95 and cdma2000, which consists of the rates
8.55 kbit/s (Rate 1 or full rate), 4.0 kbit/s (Rate 1/2 or half-rate), 2.0 kbit/s (Rate 1/4 or
quarter-rate) and 0.8 kbit/s (Rate 1/8 or eighth rate). EVRC uses Rate 1, Rate 1/2, and Rate 1/8; it
does not use quarter-rate. SMV uses all four rates and also operates in one of six network
controlled modes, Modes 0 to 5, which limit the bit rate during high traffic. Modes 4 and 5
are half-rate maximum modes. Depending on the mode of operation, different thresholds
may be set to determine the rate usage percentages.

[0053] A diagram of the apparatus for transcoding from EVRC to SMV is shown in Figure 
17. The apparatus comprises an EVRC unpacking module, an intermediate parameters 
interpolation module, a smart SMV frame classification and rate determination module, 
several mapping modules to map parameters from all allowed rate and type transcoder 
transitions, and a SMV packet formation module. The inputs to the apparatus are the EVRC 
frame packets and SMV external commands (e.g. network-controlled mode, half-rate max
flag), and the outputs are the SMV frame packets. Similarly, the apparatus for transcoding
from SMV to EVRC is shown in Figure 18. The apparatus comprises a SMV unpacking
module, an intermediate parameters interpolation module, an EVRC rate determination
module, several mapping modules to map parameters from all allowed rate and type
transcoder transitions, and an EVRC packet formation module. The inputs to the apparatus
are the SMV frame packets and EVRC external commands (e.g. half-rate max flag), and the
outputs are the EVRC frame packets. 

[0054] In transcoding from EVRC to SMV, the bitstream representing frames of data 
encoded according to EVRC is unpacked by a bitstream unpacking module. The actual
parameters from the bitstream depend on the EVRC bit rate and include line spectral 
frequencies, spectral transition indicator, pitch delay, delta pitch delay, adaptive codebook 
gain, fixed codebook shapes, fixed codebook gains and frame energy. The unquantised 
parameters are passed to the intermediate parameters interpolation module. 

[0055] The intermediate parameter interpolation module interpolates between the different 
subframe sizes of EVRC and SMV. EVRC has 3 subframes per frame, whereas SMV has 1, 
2, 3, 4, or 10 subframes per frame depending on the bit rate and frame type. Depending on 
the parameter and coding strategy, subframe interpolation may or may not be required. Figure 
19 and Figure 20 illustrate the frame and subframe sizes for the different rates and frame 
types of SMV and EVRC respectively. Since the frame size of both codecs is 20ms and the 
sampling rate of both codecs is 8kHz, no frame size or sampling rate interpolation is required. 
The output interpolated parameters, or if no interpolation was carried out, the EVRC CELP 
parameters, are passed to the smart frame classification and rate determination module and
the selected mapping module.
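
One simple way to picture subframe interpolation is to re-sample a per-subframe parameter (for example a gain) onto the destination subframe grid, weighting each source value by the number of samples it shares with the destination subframe. The sketch below assumes this overlap-weighted averaging rule, which the text does not prescribe. The EVRC layout of 53, 53 and 54 samples and the destination layout of four 40-sample subframes are used here only as an illustration of a 3-to-4 subframe conversion within a 160-sample frame; all function names are hypothetical.

```python
def subframe_bounds(lengths):
    """Turn a list of subframe lengths into (start, end) sample ranges."""
    bounds, start = [], 0
    for n in lengths:
        bounds.append((start, start + n))
        start += n
    return bounds


def interpolate_subframes(values, src_lengths, dst_lengths):
    """Map one value per source subframe to one value per destination subframe.

    Each destination value is the average of the source values, weighted by
    the number of samples each source subframe overlaps with it (assumed rule).
    """
    if len(values) != len(src_lengths):
        raise ValueError("need exactly one value per source subframe")
    if sum(src_lengths) != sum(dst_lengths):
        raise ValueError("source and destination frames must be the same length")
    src = subframe_bounds(src_lengths)
    out = []
    for d0, d1 in subframe_bounds(dst_lengths):
        acc = 0.0
        for value, (s0, s1) in zip(values, src):
            overlap = max(0, min(d1, s1) - max(d0, s0))
            acc += value * overlap
        out.append(acc / (d1 - d0))
    return out


if __name__ == "__main__":
    evrc_layout = [53, 53, 54]       # three subframes in a 160-sample frame
    smv_layout = [40, 40, 40, 40]    # illustrative four-subframe layout
    print(interpolate_subframes([0.0, 1.0, 2.0], evrc_layout, smv_layout))
```

When the source and destination layouts are identical, the function returns the input values unchanged, which corresponds to the case where no interpolation is required.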

[0056] The frame classification and rate determination module receives the EVRC CELP
parameters, the EVRC bit rate, the SMV network-controlled mode and any other SMV
external commands. The frame classification and rate determination module produces a frame
class and rate decision for SMV based on these inputs. The frame classification and rate
determination module comprises a classifier input parameter selector, for selecting which of
the EVRC parameters will be used as inputs to the classification task, M sub-classifiers,
buffers to store past input parameters and past output values, and a final decision module.
The sub-classifiers take as input the selected classification input parameters, the SMV
network-controlled mode command, and past input and output values, and generate the frame
class and rate decision. One sub-classifier may be used to determine the bit rate, and a
second sub-classifier may be used to determine the frame class. The SMV frame class is
either silence, noise-like, unvoiced, onset, non-stationary voiced or stationary voiced, and the
SMV rate may be Rate 1, Rate 1/2, Rate 1/4, or Rate 1/8. The SMV frame classification, using
EVRC parameters, is performed according to a pre-defined configuration and classifier
algorithm. The coefficients or rules of the classifier are determined during a prior EVRC-to-SMV
classifier training or construction process. The frame classification and rate
determination module includes a final decision module that enforces all SMV rate transition
rules to ensure illegal rate transitions are not allowed. For example, in SMV, a Rate 1 Type 1
frame cannot follow a Rate 1/8 frame. This frame classification and rate determination module
replaces the SMV standard classifier, which requires a large amount of processing to derive
the parameters and features required for classification. The SMV frame-processing functions
are shown in Figure 7, and the many steps of the SMV classification procedure are shown in
Figure 8. These functions are not necessary in the present invention as the already available
EVRC CELP parameters are used as inputs to the classifier module.
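
The role of the final decision module can be sketched as a small post-processing step applied to the sub-classifier outputs. The example below enforces only the two constraints named in the text: the half-rate maximum command and the rule that a Rate 1 Type 1 frame cannot follow a Rate 1/8 frame. The Decision class, the final_decision function and the choice to fall back to Type 0 when the transition rule is violated are assumptions made for illustration; the full set of SMV transition rules is not reproduced here.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

FULL, HALF, QUARTER, EIGHTH = (Fraction(1), Fraction(1, 2),
                               Fraction(1, 4), Fraction(1, 8))


@dataclass(frozen=True)
class Decision:
    rate: Fraction    # one of FULL, HALF, QUARTER, EIGHTH
    frame_type: int   # 0 or 1; only meaningful for Rate 1 and Rate 1/2


def final_decision(proposed: Decision,
                   previous: Optional[Decision],
                   half_rate_max: bool = False) -> Decision:
    """Apply external commands and transition rules to a proposed decision."""
    rate, frame_type = proposed.rate, proposed.frame_type

    # External command: cap the rate at Rate 1/2 when half-rate max is set.
    if half_rate_max and rate > HALF:
        rate = HALF

    # Rate 1/4 and Rate 1/8 have a single processing algorithm.
    if rate not in (FULL, HALF):
        frame_type = 0

    # Transition rule from the text: Rate 1 Type 1 cannot follow Rate 1/8.
    # Falling back to Type 0 is an assumed resolution, not taken from SMV.
    if (previous is not None and previous.rate == EIGHTH
            and rate == FULL and frame_type == 1):
        frame_type = 0

    return Decision(rate, frame_type)


if __name__ == "__main__":
    print(final_decision(Decision(FULL, 1), Decision(EIGHTH, 0)))
```

Keeping the rule enforcement separate from the sub-classifiers means the trained classifier coefficients never need to encode the transition constraints themselves.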

[0057] The intermediate parameters interpolation module and the SMV smart frame
classification and rate determination module are linked to one of many interpolation and
mapping modules by a switching module. EVRC has a single processing algorithm for each
rate, whereas SMV has two possible processing algorithms for each of Rate 1 and Rate 1/2,
and a single processing algorithm for each of Rate 1/4 and Rate 1/8. The SMV frame type and
bit rate determined by the frame classification and rate determination module control which
interpolation and mapping module is to be chosen. For Rates 1 and 1/2 of SMV, the stationary
voiced frame class uses subframe processing Type 1 and all other frame classes use subframe
processing Type 0. As shown in Figure 17, there are interpolation and mapping modules for
each allowed EVRC rate and SMV type and rate combination. For example, interpolation
and mapping modules include:

EVRC Rate 1 to SMV Rate 1 Type 0
EVRC Rate 1 to SMV Rate 1 Type 1
EVRC Rate 1/2 to SMV Rate 1 Type 0
EVRC Rate 1/2 to SMV Rate 1 Type 1
EVRC Rate 1/2 to SMV Rate 1/2 Type 0
EVRC Rate 1/2 to SMV Rate 1/2 Type 1

and so on. 
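
The switching behaviour described above amounts to a lookup keyed on the source rate, the destination rate and the destination subframe type. The sketch below derives the SMV type from the frame class as stated in the text (stationary voiced gives Type 1 at Rates 1 and 1/2, everything else gives Type 0) and builds the module label. The function names and the string labels for frame classes are hypothetical, and the sketch does not attempt to encode which rate combinations are actually allowed.

```python
EVRC_RATES = ("1", "1/2", "1/8")
SMV_RATES = ("1", "1/2", "1/4", "1/8")
SMV_RATES_WITH_TYPES = ("1", "1/2")  # rates with two processing algorithms


def smv_subframe_type(frame_class, smv_rate):
    """Return 0 or 1 for typed rates, or None for single-algorithm rates."""
    if smv_rate not in SMV_RATES_WITH_TYPES:
        return None
    return 1 if frame_class == "stationary_voiced" else 0


def select_mapping_module(evrc_rate, smv_rate, frame_class):
    """Return the label of the interpolation and mapping module to use."""
    if evrc_rate not in EVRC_RATES:
        raise ValueError("unknown EVRC rate: %r" % (evrc_rate,))
    if smv_rate not in SMV_RATES:
        raise ValueError("unknown SMV rate: %r" % (smv_rate,))
    label = "EVRC Rate %s to SMV Rate %s" % (evrc_rate, smv_rate)
    frame_type = smv_subframe_type(frame_class, smv_rate)
    if frame_type is not None:
        label += " Type %d" % frame_type
    return label


if __name__ == "__main__":
    print(select_mapping_module("1/2", "1", "onset"))
    print(select_mapping_module("1", "1", "stationary_voiced"))
```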

[0058] For the SMV-to-EVRC transcoder, interpolation and mapping modules include:
SMV Rate 1 Type 0 to EVRC Rate 1
SMV Rate 1 Type 1 to EVRC Rate 1
SMV Rate 1 Type 0 to EVRC Rate 1/2
SMV Rate 1 Type 1 to EVRC Rate 1/2
SMV Rate 1/2 Type 0 to EVRC Rate 1/2
SMV Rate 1/2 Type 1 to EVRC Rate 1/2

and so on. 

[0059] Each mapping module comprises a speech spectral parameter mapping unit, an
excitation mapping unit, and a mapping strategy decision unit. The speech spectral parameter
mapping unit maps the EVRC line spectral frequencies directly to SMV line spectral
frequencies. This occurs for all source EVRC bit rates. The parameters passed to the
excitation mapping unit depend on the source EVRC bit rate. For EVRC Rates 1 and 1/2, the
input CELP excitation parameters are the pitch lag, delta pitch lag (Rate 1 only), adaptive
codebook gain, fixed codevectors, and fixed codebook gain. For EVRC Rate 1/8, typically
inactive frames, the input excitation parameter is the frame energy. The excitation
parameters are mapped to SMV excitation parameters, depending on the selected mapping
module and mapping strategy. The mapping strategy decision unit controls the mapping
strategy to be used. In this example, the mapping strategy for active speech is to perform
analysis in the excitation domain.
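
The dependence of the excitation inputs on the source rate can be summarised in a few lines. The function below simply restates the paragraph above; the function name and the parameter identifiers are hypothetical labels chosen for this sketch.

```python
def excitation_inputs(evrc_rate):
    """List the parameters passed to the excitation mapping unit."""
    if evrc_rate in ("1", "1/2"):
        params = ["pitch_lag", "adaptive_codebook_gain",
                  "fixed_codevectors", "fixed_codebook_gain"]
        if evrc_rate == "1":
            # Delta pitch lag is available at Rate 1 only.
            params.insert(1, "delta_pitch_lag")
        return params
    if evrc_rate == "1/8":
        # Typically inactive frames: only the frame energy is available.
        return ["frame_energy"]
    raise ValueError("EVRC does not use rate %r" % (evrc_rate,))


if __name__ == "__main__":
    for rate in ("1", "1/2", "1/8"):
        print(rate, excitation_inputs(rate))
```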

[0060] Using the EVRC excitation parameters of pitch delay, delta pitch delay, adaptive 
codebook gain, fixed codevectors, fixed codebook gains and frame energy, the excitation
signal is reconstructed. To reduce complexity and quality degradations, the EVRC decoder
operations of filtering the excitation signal by the synthesis filter to convert to the speech 
domain and post-filtering are not used. Similarly, the pre-processing operations of SMV are 
not used. These include silence enhancement, high-pass filtering, noise suppression and 
adaptive tilt filtering. Since the EVRC encoder contains noise-suppression operations, the 
transcoder does not include further noise-suppression functions.
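
In generic CELP terms, the excitation for a subframe is the sum of a scaled adaptive codebook contribution (the past excitation delayed by the pitch lag) and a scaled fixed codevector. The sketch below shows that textbook structure with an integer pitch lag. It is a simplification assumed for illustration and is not bit-exact EVRC decoding, which derives the adaptive contribution from an interpolated delay contour; the function and argument names are hypothetical.

```python
def reconstruct_excitation(past_excitation, pitch_lag, adaptive_gain,
                           fixed_codevector, fixed_gain):
    """Rebuild one subframe of excitation from decoded CELP parameters.

    excitation[n] = adaptive_gain * excitation[n - pitch_lag]
                    + fixed_gain * fixed_codevector[n]
    """
    if pitch_lag <= 0 or pitch_lag > len(past_excitation):
        raise ValueError("pitch lag must lie within the excitation history")
    buffer = list(past_excitation)
    start = len(buffer)
    for n, pulse in enumerate(fixed_codevector):
        # When the lag is shorter than the subframe, this index reaches
        # samples generated earlier in the same subframe.
        delayed = buffer[start + n - pitch_lag]
        buffer.append(adaptive_gain * delayed + fixed_gain * pulse)
    return buffer[start:]


if __name__ == "__main__":
    history = [1.0, 0.0, 0.0, 0.0]
    print(reconstruct_excitation(history, 4, 0.5, [0.0] * 8, 1.0))
```

The reconstructed signal stays in the excitation domain: no synthesis filtering or post-filtering is applied, in line with the paragraph above.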

[0061] In RCELP-based coders like EVRC and SMV, a fundamental part of the signal 
processing is in the modification of the speech to match an interpolated pitch track. This 
saves quantisation bits required for pitch representation, but involves a large amount of 
computation as pitch pulses must be detected and individually shifted or time-warped. For 
the EVRC-to-SMV transcoding example, the signal modification functions within the SMV
encoder may be bypassed. This is due to the fact that similar signal modification has already 
been performed in the EVRC encoder. Hence the reconstructed excitation signal already 
possesses a smooth pitch characteristic and is already in a form amenable to efficient 
quantization. The target signal for the adaptive codebook search is thus the excitation signal, 
without pitch modifications, that has been calibrated to account for differences between the 
quantized EVRC LSFs and the quantized SMV LSFs. 
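
The text does not spell out how the excitation is calibrated. One plausible reading, assumed here purely for illustration, is that the excitation is passed through the source synthesis filter 1/A_src(z) and then through the destination analysis filter A_dst(z), so that it becomes the residual the destination LPC filter would need to produce the same speech. The sketch takes LPC polynomials of the form [1, a1, ..., ap], assumes the LSF-to-LPC conversion has already been done, and starts from zero filter memory, whereas a real transcoder would carry state across frames. All names are hypothetical.

```python
def calibrate_excitation(excitation, a_src, a_dst):
    """Filter the excitation by A_dst(z) / A_src(z), starting from zero state.

    a_src and a_dst are LPC polynomials [1, a1, ..., ap] for
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    if a_src[0] != 1 or a_dst[0] != 1:
        raise ValueError("LPC polynomials must start with 1")
    speech = []      # output of the source synthesis filter 1/A_src(z)
    calibrated = []  # output of the destination analysis filter A_dst(z)
    for n, x in enumerate(excitation):
        s = x - sum(a_src[k] * speech[n - k]
                    for k in range(1, len(a_src)) if n - k >= 0)
        speech.append(s)
        y = sum(a_dst[k] * speech[n - k]
                for k in range(len(a_dst)) if n - k >= 0)
        calibrated.append(y)
    return calibrated


if __name__ == "__main__":
    x = [1.0, 0.5, -0.25, 0.0]
    print(calibrate_excitation(x, [1.0, -0.9, 0.2], [1.0, -0.8, 0.15]))
```

Under this reading, identical source and destination filters leave the excitation unchanged apart from floating-point rounding, which matches the intuition that calibration only compensates for quantization differences between the two sets of LSFs.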

[0062] Mapping of excitation parameters is performed as described in the previous section. 
Simplifications can be made to the fixed codebook search, as SMV contains multiple
sub-codebooks for each rate and frame type. Since the EVRC bit rate, fixed codevector and fixed
codebook structure are known, it may not be necessary to search all sub-codebooks to best
match the target excitation. Instead, each mapping module may contain a single fixed
sub-codebook or a subset of the fixed sub-codebooks to reduce computational complexity.
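
The saving from a restricted search can be illustrated with a toy example. The sketch below searches only the sub-codebooks named in allowed, using the usual CELP criterion of maximising the squared correlation with the target divided by the codevector energy, which is equivalent to minimising the squared error at the optimal gain. The codebook contents, the sub-codebook names and the function name are invented for illustration and bear no relation to the actual SMV codebooks.

```python
def search_fixed_codebook(target, sub_codebooks, allowed):
    """Return (sub_codebook_name, index, gain) of the best match, or None.

    Only the sub-codebooks listed in `allowed` are searched.
    """
    best_score, best = None, None
    for name in allowed:
        for index, codevector in enumerate(sub_codebooks[name]):
            energy = sum(c * c for c in codevector)
            if energy == 0:
                continue  # an all-zero codevector cannot match anything
            correlation = sum(t * c for t, c in zip(target, codevector))
            score = correlation * correlation / energy
            if best_score is None or score > best_score:
                best_score = score
                best = (name, index, correlation / energy)
    return best


if __name__ == "__main__":
    toy_codebooks = {
        "A": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
        "B": [[0.0, 0.0, 1.0, 1.0]],
    }
    target = [0.0, 2.0, 0.1, 0.0]
    print(search_fixed_codebook(target, toy_codebooks, ["A"]))
```

Restricting allowed to a single sub-codebook reduces the number of correlation computations in proportion to the codevectors skipped, at the cost of possibly missing a better match elsewhere.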

[0063] A second-stage switching module links the interpolation and mapping module to the
SMV bitstream packing module. The bitstream is packed according to the SMV frame type 
and bit rate. One SMV output frame is produced for each EVRC input frame. 

OTHER CELP TRANSCODERS 

[0064] The method and apparatus for voice transcoding between variable rate
coders described in this document are generic to all linear prediction-based voice codecs, and
apply to any voice transcoder between the existing codecs G.723.1, GSM-AMR, EVRC,
G.728, G.729, G.729A, QCELP, MPEG-4 CELP, SMV, AMR-WB, VMR, as well as
future voice codecs. The invention applies especially to those transcoders in which the
destination coder makes use of rate determination and/or frame classification information.

[0065] The previous description of the preferred embodiment is provided to enable any 
person skilled in the art to make or use the present invention. Various modifications to
these embodiments will be readily apparent to those skilled in the art, and the generic 
principles defined herein may be applied to other embodiments without the use of the 
inventive faculty. Thus, the present invention is not intended to be limited to the 
embodiments shown herein but is to be accorded the widest scope consistent with the 
principles and novel features disclosed herein. 